A Scalable XSLT Processing Framework based on MapReduce

Authors

  • Ren Li
  • Jianhua Luo
  • Dan Yang
  • Haibo Hu
  • Ling Chen
Abstract

The eXtensible Stylesheet Language Transformations (XSLT) is a de facto standard for transforming and extracting XML data. Processing large volumes of XML data efficiently is a challenge for conventional XSLT processors, which are designed to run on a single machine. For such data-intensive problems, the MapReduce paradigm from the cloud computing domain has recently received considerable attention in both academia and the IT industry. In this paper, a novel MapReduce-based distributed XSLT processing framework named CloudXSLT is proposed to implement efficient and scalable XML data transformation. First, the architecture of the CloudXSLT framework is outlined. Subsequently, several XML data and XSLT rule representation models suitable for the MapReduce paradigm are defined, and several MapReduce-based distributed XSLT processing algorithms are proposed. Finally, an experiment in a simulation environment with real XML datasets shows that the framework is more efficient and scalable than conventional XSLT processors when processing large XML datasets.
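The abstract does not spell out CloudXSLT's representation models or algorithms, so the following is only a generic sketch of how a record-level XML transformation can be phrased as map and reduce steps. It is not the authors' method. The record layout, the function names (`map_record`, `reduce_group`, `run_job`), and the in-memory shuffle are all assumptions made for illustration; a real deployment would run on a MapReduce runtime such as Hadoop.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical input: one small, self-contained XML record per entry.
RECORDS = [
    '<book genre="db"><title>Hadoop Basics</title></book>',
    '<book genre="web"><title>XSLT in Practice</title></book>',
    '<book genre="db"><title>MapReduce Patterns</title></book>',
]


def map_record(xml_text):
    """Map step: apply one template-like rule to a single record.

    The rule (assumed for illustration) rewrites <book> into <item>
    and emits the genre attribute as the grouping key.
    """
    book = ET.fromstring(xml_text)
    item = ET.Element("item")
    item.text = book.findtext("title")
    yield book.get("genre"), ET.tostring(item, encoding="unicode")


def reduce_group(key, fragments):
    """Reduce step: wrap all fragments that share a key in one element."""
    return '<group genre="%s">%s</group>' % (key, "".join(fragments))


def run_job(records):
    """Simulate the shuffle phase in memory, then reduce each key."""
    groups = defaultdict(list)
    for record in records:
        for key, fragment in map_record(record):
            groups[key].append(fragment)
    return [reduce_group(key, groups[key]) for key in sorted(groups)]


OUTPUT = run_job(RECORDS)
```

Because each record is transformed independently in the map step, the work can be spread across machines, which is the property a single-machine XSLT processor lacks.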

Similar resources

ETLMR: A Highly Scalable Dimensional ETL Framework Based on MapReduce

Extract-Transform-Load (ETL) flows periodically populate data warehouses (DWs) with data from different source systems. An increasing challenge for ETL flows is processing huge volumes of data quickly. MapReduce is establishing itself as the de-facto standard for large-scale data-intensive processing. However, MapReduce lacks support for high-level ETL specific constructs, resulting in low ETL ...

On-line Calibration of Semiconductor Gas Sensors Based on Prediction Model

REGULAR PAPERS A Novel Thermal Measurement for Heart Rate Bin Jing and Haiyun Li An Optimal Memory BISR Implementation Maddu Karunaratne and Bejoy Oomann A Scalable XSLT Processing Framework based on MapReduce Ren Li, Jianhua Luo, Dan Yang, Haibo Hu, and Ling Chen Arc Efficiency Assisted Finite Element Model for Predicting Residual Stress of TIG Welded Sheet Kuang-Hung Tseng and Jie-Meng Huang ...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. Current Hadoop and the rack-aware data placement strategy of the existing Hadoop distributed file system assume a homogeneous cluster, in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...

A Scalable RDF Data Processing Framework based on Pig and Hadoop

To handle the growing amount of available RDF data effectively, scalable and flexible RDF data processing frameworks are needed. While emerging technologies for Big Data, such as Hadoop-based systems that take advantage of scalable and fault-tolerant distributed processing, based on Google’s distributed file system and MapReduce parallel model, have become available, there are still m...

Scalable community detection in massive social networks using MapReduce

J. Shi, W. Xue, W. Wang, Y. Zhang, B. Yang, J. Li. In this paper, we present a community-detection solution for massive-scale social networks using MapReduce, a parallel programming framework. We use a similarity metric to model the community probability, and the model is designed to be parallelizable and scalable in the MapReduce framework. More i...

Journal title:
  • JCP

Volume 8  Issue 

Pages  -

Publication date 2013